SAMZA-2124: Add Beam API doc to the website #948
dxichen
left a comment
Minor comments, thanks for the docs!
| {% endhighlight %} | ||
| To run this Beam program with Samza, you can simply provides "--runner=SamzaRunner" as a program argument. You can follow our [quick start](/startup/quick-start/{{site.version}}/beam.html) to set up your project and run different examples. For more details on writing the Beam program, please refer the comprehensive [Beam programming guide](https://beam.apache.org/documentation/programming-guide/). |
s/provides/provide
s/refer the comprehensive/refer to the..
| ``` | ||
| $ deploy/examples/bin/run-beam-standalone.sh org.apache.beam.examples.WordCount \ | ||
| --configFilePath=$PWD/deploy/examples/config/standalone.properties \ | ||
| --inputFile=/Users/xiliu/opensource/samza-beam-examples/pom.xml --output=word-counts.txt \ |
Please remove the username from the path. I have a patch for these docs in apache/samza-beam-examples#1.
I switched to KafkaWordCount, to avoid the batch problems we have.
| ``` | ||
| $ deploy/examples/bin/run-beam-yarn.sh org.apache.beam.examples.WordCount \ | ||
| --configFilePath=$PWD/deploy/examples/config/yarn.properties \ | ||
| --inputFile=/Users/xiliu/opensource/samza-beam-examples/pom.xml \ |
Fixed by switching to kafka.
| #### Samza SQL API examples | ||
| You can easily create a Samza job declaratively using | ||
| [Samza SQL](https://samza.apache.org/learn/tutorials/0.14/samza-sql.html). |
| ### Apache Beam - A Samza’s Perspective | ||
| The goal of Samza is to provide large-scale streaming processing capabilities with first-class state support. This does not contradict with Beam. In fact, while Samza lays out a solid foundation for large-scale stateful stream processing, Beam adds the cutting-edge stream processing API and model on top of it. The Beam API and model allows further optimization in the Samza platform, including multi-stage distributed computation and parallel processing on the per-key basis. The performance enhancements from these optimizations will benefit both Samza and its users. Samza can also further improve Beam model by providing various use cases. Adopting Beam provides a solid understanding of the latest data processing technology, and we believe Samza will benefit from it. No newline at end of file |
s/Adopting Beam provides a solid understanding of the latest data processing technology/ Beam provides cutting-edge data processing capabilities.
| ### Introduction | ||
| Apache Beam brings an easy-to-use, but powerful API and model for state-of-art stream and batch data processing with portability across a variety of languages. The Beam API and model has the following characteristics: |
Minor:
s/but powerful API/ powerful API
It seems better to keep the "but". I removed the "," to improve readability.
| - *Simple constructs, powerful semantics*: the whole beam API can be simply described by a `Pipeline` object, which captures all your data processing steps from input to output. Beam SDK supports over [20 data IOs](https://beam.apache.org/documentation/io/built-in/), and data transformations from simple [Map](https://beam.apache.org/releases/javadoc/2.11.0/org/apache/beam/sdk/transforms/MapElements.html) to complex [Combines and Joins](https://beam.apache.org/releases/javadoc/2.11.0/index.html?org/apache/beam/sdk/transforms/Combine.html). | ||
| - *Strong consistency via event-time*: Beam provides advanced [event-time support](https://beam.apache.org/documentation/programming-guide/#watermarks-and-late-data) so you can perform windowing and aggregations based on when the events happen, instead of when they are consumed. The event-time mechanism improves the accuracy of processing results, and has repeatability when reprocessing the same data set. |
Minor:
- s/instead of when they are consumed/instead of arrival time?
- s/and has repeatability/and guarantees repeatability in results/
| 1. Download and install [Apache Maven](http://maven.apache.org/download.cgi) by following Maven’s [installation guide](http://maven.apache.org/install.html) for your specific operating system. | ||
| 1. A script named "grid" is included in this project which allows you to easily download and install Zookeeper, Kafka, and Yarn. |
I think all the individual line items in the setup section (install JDK, install Maven, install grid) are numbered 1. It may be better to number them in the right order.
Thanks for the catch. The install-grid step shouldn't be marked as 1. I fixed it in the update.
LGTM, thanks!
* SAMZA-2124: Add Beam API doc to the website * Address pr feedback
Add beam quick start, examples and api docs.